Abstract

This paper considers the problem of comparing two processes with
panel data. A nonparametric test is proposed for detecting a monotone
change in the link between the two process distributions. The test
statistic is of CUSUM type, based on the empirical distribution functions.
The asymptotic distribution of the proposed statistic is derived and its
finite sample properties are examined by bootstrap procedures through
Monte Carlo simulations.
1 Introduction

Many situations lead to the comparison of two random processes. In the
parametric case, the problem of change detection has been widely studied in
the time series literature. A common problem is to test for a change in the
mean or in the variance of the time series by using a parametric model (see
for instance [5] or [7], and references therein). In the Gaussian case,
comparisons of processes are considered through their covariance structures
(see [9], [H]). These distributional assumptions can be relaxed when the
study concerns processes observed through panel data. This situation is
frequently encountered in medical follow-up studies when two groups of
patients are observed and compared. Each subject in the study gives rise to
a random process $(X_t)$ denoting the measurement of the patient up to time
$t$ (such data are referred to as panel data). In this context, [3 [H Q]
considered the problem of testing the equality of mean functions and
proposed new multi-sample tests for panel count data.

In this paper we consider the general problem of comparing two processes
which may differ by a transformation of their distributions. Our purpose
is to test whether this transformation changes over time. For this, two panels
are considered: $(X_{i,t})_{1 \le i \le N_x;\, 1 \le t \le n}$ and
$(Y_{i,t})_{1 \le i \le N_y;\, 1 \le t \le n}$, not necessarily independent;
that is, we can have i.i.d. paired observations $(X_i, Y_i)_{i=1,\dots,N}$
with dependence between $X_i$ and $Y_i$. It is assumed that for each $t$, the
$X_{i,t}$, $1 \le i \le N_x$ (resp. $Y_{i,t}$, $1 \le i \le N_y$) are i.i.d.
random variables with common distribution function $F_t$ (resp. $G_t$) and
with support $\mathcal{X}$ (resp. $\mathcal{Y}$). We also assume that for all
$1 \le t \le n$ there exist monotone transformations $h_t$ such that the
following equality in distribution holds: $X_t \overset{d}{=} h_t(Y_t)$.
Without loss of generality we consider that the functions $h_t(\cdot)$ are
increasing. Note that if $F_t$ is invertible then there exists a trivial
transformation $h_t$ given by $h_t = F_t^{-1} \circ G_t$. We are interested
in testing whether this transformation is time independent; that is, whether
for all $t$ the equality $h_t = h$ holds. A simple illustration is the case
where $X_t$ and $Y_t$ are Gaussian processes with means $m_X$ and $m_Y$ and
variances $t\sigma_X^2$ and $t\sigma_Y^2$, respectively. In that case the
function $h$ is linear.
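The Gaussian illustration can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names `h_t` and `h_linear` and the parameter values are hypothetical, and the closed form $h(y) = m_X + (\sigma_X/\sigma_Y)(y - m_Y)$ is the one obtained by composing the two Gaussian distribution functions. It evaluates $F_t^{-1} \circ G_t$ with the standard library and compares it with that linear map for several values of $t$.

```python
import math
from statistics import NormalDist


def h_t(y, t, m_x, m_y, s_x, s_y):
    """F_t^{-1}(G_t(y)) for X_t ~ N(m_x, t*s_x^2) and Y_t ~ N(m_y, t*s_y^2)."""
    f_t = NormalDist(m_x, math.sqrt(t) * s_x)  # distribution of X_t
    g_t = NormalDist(m_y, math.sqrt(t) * s_y)  # distribution of Y_t
    return f_t.inv_cdf(g_t.cdf(y))


def h_linear(y, m_x, m_y, s_x, s_y):
    """Candidate time-independent link: a linear map that does not involve t."""
    return m_x + (s_x / s_y) * (y - m_y)


# Hypothetical parameter values, chosen only for the illustration.
params = dict(m_x=1.0, m_y=-0.5, s_x=2.0, s_y=0.8)
gaps = [abs(h_t(0.3, t, **params) - h_linear(0.3, **params)) for t in (1, 5, 50)]
```

The factor $\sqrt{t}$ cancels in the composition, which is why the gaps stay at rounding-error level for every $t$.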

More generally, observing both processes $X_t$ and $Y_t$ with panel data we
want to test

$$H_0 : \forall t,\ h_t = h \quad \text{against} \quad H_1 : \exists\, t_1 \neq t_2,\ h_{t_1} \neq h_{t_2}.$$

It is clear that $H_0$ coincides with the equality in distribution
$X_t \overset{d}{=} h(Y_t)$, for all $t$. Following [8] (see also [7]), we
construct a nonparametric test statistic based on the empirical estimator of
$h_t$, denoted by $\hat h_t$. We show that, under $H_0$, the limiting
distribution of this statistic is proportional to the supremum of a Brownian
bridge.

When $H_0$ is not rejected, it is of interest to estimate $h$ and to
interpret its estimator $\hat h$. The test can then be viewed as a first step
that legitimates the estimation and interpretation of a constant
transformation $h$ between the distributions of two samples, possibly paired.

The paper is organized as follows. In Section 2 we construct the test
statistic. In Section 3 we perform a simulation study using a bootstrap
procedure to evaluate the finite sample properties of the test. The power is
evaluated against alternatives where there are smooth scale or position time
changes in the process distribution. Section 4 contains brief concluding
remarks.

2 The test statistic

A natural nonparametric estimator of $h_t$ is given by

$$\hat h_t(x) = X_{([N_x \hat G_t(x)]+1),t},$$

where $X_{(i),t}$ denotes the $i$th order statistic of
$(X_{i,t})_{1 \le i \le N_x}$ and $\hat G_t$ is the empirical distribution
function of $(Y_{i,t})_{1 \le i \le N_y}$, that is

$$\hat G_t(x) = \frac{1}{N_y} \sum_{i=1}^{N_y} \mathbf{1}\{Y_{i,t} \le x\}.$$
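As a minimal sketch of this estimator for one fixed time $t$, assuming the two samples are plain Python lists, one may write the following. The names `count_le` and `h_hat` are hypothetical, and clipping the rank at $N_x$ when every $Y$ observation lies below $x$ is an assumption made here to keep the index valid; the text does not say how that boundary case is handled.

```python
def count_le(sample, x):
    """Number of observations in `sample` that are <= x."""
    return sum(1 for v in sample if v <= x)


def h_hat(x_sample, y_sample, x):
    """Order statistic X_(r) with r = [N_x * G_hat(x)] + 1, for one time t."""
    xs = sorted(x_sample)
    n_x, n_y = len(xs), len(y_sample)
    # [N_x * G_hat(x)] computed with integers to avoid floating-point floor errors.
    r = (n_x * count_le(y_sample, x)) // n_y + 1
    # Assumption: when G_hat(x) = 1 the rank would be N_x + 1, so clip it to N_x.
    r = min(r, n_x)
    return xs[r - 1]  # order statistics are 1-indexed


# Small worked case: two of the four Y values are <= 25, so r = [5 * 2/4] + 1 = 3.
example = h_hat([3, 1, 2, 5, 4], [10, 20, 30, 40], 25)
```

Sorting once per time point and reusing the sorted sample for every evaluation point $x$ would be the natural optimisation when the estimator is computed on a grid.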
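A nonparametric test is considered to test the variation of $h_t$. For
$\tau \in (0,1)$, $x \in \mathcal{Y}$, write

$$B_n(\tau, x) = \frac{1}{\sqrt{n}\,\hat\sigma_n} \left( \sum_{t=1}^{[n\tau]} \hat h_t(x) - \frac{[n\tau]}{n} \sum_{t=1}^{n} \hat h_t(x) \right), \qquad (1)$$

where

$$\hat\sigma_n^2 = \frac{1}{n} \sum_{t=1}^{n} \big(\hat h_t(x) - \bar h(x)\big)^2 \quad \text{and} \quad \bar h(x) = \frac{1}{n} \sum_{t=1}^{n} \hat h_t(x).$$

For a given square integrable function $w$ we define the following test
statistic

$$S_n(w) = \int_{\mathbb{R}} w(x) \sup_{0 \le \tau \le 1} |B_n(\tau, x)| \, dx.$$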
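For a fixed $x$, the supremum over $\tau$ only needs to be evaluated at $\tau = k/n$, because the partial sums change only at those points. The sketch below computes that maximum from a series $\hat h_1(x), \dots, \hat h_n(x)$ that is assumed to be already available. The function name `cusum_sup` is hypothetical, and returning 0 for a constant series (where $\hat\sigma_n = 0$) is a convention adopted here, not one stated in the text.

```python
import math


def cusum_sup(h_hats):
    """max over k = 1..n of |B_n(k/n, x)| for the series h_hat_1(x), ..., h_hat_n(x)."""
    n = len(h_hats)
    total = sum(h_hats)
    mean = total / n
    sigma = math.sqrt(sum((h - mean) ** 2 for h in h_hats) / n)
    if sigma == 0.0:
        return 0.0  # assumption: a constant series is treated as showing no change
    partial, best = 0.0, 0.0
    for k, h in enumerate(h_hats, start=1):
        partial += h  # sum of the first k estimates
        best = max(best, abs(partial - k / n * total))
    return best / (math.sqrt(n) * sigma)


# A series with a level shift in the middle.
value = cusum_sup([0.0, 0.0, 1.0, 1.0])
```

The normalisation by $\hat\sigma_n$ makes the quantity invariant to affine rescaling of the series, so only the shape of the path of $\hat h_t(x)$ over time matters.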

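To establish the limiting distribution of the statistic $S_n(w)$ under the
null, we need the following assumptions:

• Assumption 1. There exists $a < \infty$ such that $N_x/(N_x + N_y) \to a$.

• Assumption 2. There exist $\gamma_1 > 0$ and $\gamma_2 > 0$ such that
$f_t(x) > \gamma_1$ and $g_t(y) > \gamma_2$ for all
$(x, y) \in \mathcal{X} \times \mathcal{Y}$, where $f_t$ and $g_t$ are the
density functions of $X_t$ and $Y_t$.

• Assumption 3. For all $x \in \mathcal{X}$, there exists
$0 < \sigma(x) < \infty$ such that

$$\frac{1}{n} \sum_{t=1}^{n} \sigma_{1,t}^2(x) \to \sigma^2(x), \quad \text{as } n \to \infty,$$

where

$$\sigma_{1,t}^2(x) = \sigma_t^2(x)\, \frac{N_x + N_y}{N_x N_y} \quad \text{and} \quad \sigma_t^2(x) = \frac{G_t(x)(1 - G_t(x))}{f_t^2\big(F_t^{-1}(G_t(x))\big)}. \qquad (2)$$

• Assumption 4.

$$\frac{n (N_x + N_y)}{N_x N_y} \to 0.$$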

Remark 2.1 Assumptions 1 and 2 are standard. Assumption 3 states that the
second moments converge on average. If Assumption 1 is satisfied, Assumption
4 is equivalent to $n = o(N_x)$ or $n = o(N_y)$.

Theorem 2.1 Let Assumptions 1-4 hold. Then under the null $H_0$ we have the
following convergence in distribution

$$S_n(w) \xrightarrow{d} S(w) = B_\infty \int_{\mathbb{R}} w(x)\,dx, \quad \text{as } n \to \infty,\ N_x \to \infty \text{ and } N_y \to \infty, \qquad (3)$$

where $B_\infty = \sup_{0 \le \tau \le 1} |B(\tau)|$ and $B$ is a Brownian
bridge.
Remark 2.2 The cumulative distribution function of $B_\infty$ is given by
(see JJjj)

$$F_{B_\infty}(z) = 1 + 2 \sum_{k=1}^{\infty} (-1)^k \exp\{-2 k^2 z^2\}.$$
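The series in Remark 2.2 converges very quickly for moderate $z$, so asymptotic critical values for $B_\infty$ can be obtained numerically. The following is a minimal sketch, assuming a truncation after 100 terms and a bisection search on a hypothetical bracket $[0.2, 5]$; the function names are illustrative only.

```python
import math


def sup_bridge_cdf(z, terms=100):
    """Truncated series 1 + 2 * sum_{k>=1} (-1)^k exp(-2 k^2 z^2)."""
    if z <= 0:
        return 0.0
    series = sum((-1) ** k * math.exp(-2.0 * k * k * z * z)
                 for k in range(1, terms + 1))
    return 1.0 + 2.0 * series


def sup_bridge_quantile(p, lo=0.2, hi=5.0, tol=1e-10):
    """Solve sup_bridge_cdf(z) = p by bisection on the assumed bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sup_bridge_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Asymptotic critical value of B_infinity at level 5%, approximately 1.358.
critical_5 = sup_bridge_quantile(0.95)
```

By (3), the corresponding critical value for $S_n(w)$ is this quantile multiplied by $\int w(x)\,dx$.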

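Before proving Theorem 2.1, we state three lemmas.
Lemma 2.1 Under Assumption 1 we have

$$\left( \frac{N_x N_y}{N_x + N_y} \right)^{1/2} \big(\hat h_t(x) - h_t(x)\big) \xrightarrow{d} N(0, \sigma_t^2(x)), \quad \text{as } N_x \to \infty,\ N_y \to \infty, \qquad (4)$$

where $\sigma_t^2(x)$ is given by (2).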

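Proof $(\mathbf{1}\{Y_{i,t} \le x\})_{1 \le i \le N_y}$ is an i.i.d. sequence
with mean $G_t(x)$ and variance $G_t(x)(1 - G_t(x))$, hence an immediate
application of the central limit theorem yields

$$N_y^{1/2} \big(\hat G_t(x) - G_t(x)\big) \xrightarrow{d} N\big(0, G_t(x)(1 - G_t(x))\big). \qquad (5)$$

By the delta-method the last convergence implies that

$$N_y^{1/2} \big(F_t^{-1}(\hat G_t(x)) - F_t^{-1}(G_t(x))\big) \xrightarrow{d} N(0, \sigma_t^2(x)). \qquad (6)$$

For $p \in\, ]0; 1[$ fixed, denote by $\hat F_t^{-1}(p)$ the sample
$p$-quantile; that is, $\hat F_t^{-1}(p) = X_{(r),t}$, where
$r = [N_x p] + 1$. By Theorem 3 of [12] we obtain

$$N_x^{1/2} \big(\hat F_t^{-1}(p) - F_t^{-1}(p)\big) \xrightarrow{d} N\left(0, \frac{p(1-p)}{f_t^2\big(F_t^{-1}(p)\big)}\right), \quad p \in (0, 1). \qquad (7)$$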

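Let $\phi_X(u) = E(\exp(iuX))$ denote the characteristic function of the
random variable $X$ and let $\phi_{X|Y}(u) = E(\exp(iuX) \mid Y)$ denote the
conditional characteristic function of the random variable $X$ conditional on
$Y$. We have

$$H_t = \left( \frac{N_x N_y}{N_x + N_y} \right)^{1/2} \big(\hat h_t(x) - h_t(x)\big) = H_{1,t} + H_{2,t},$$

where

$$H_{1,t} = \left( \frac{N_x N_y}{N_x + N_y} \right)^{1/2} \big(\hat F_t^{-1}(\hat G_t(x)) - F_t^{-1}(\hat G_t(x))\big),$$

$$H_{2,t} = \left( \frac{N_x}{N_x + N_y} \right)^{1/2} N_y^{1/2} \big(F_t^{-1}(\hat G_t(x)) - F_t^{-1}(G_t(x))\big).$$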
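Then we get

$$E(\exp(iuH_t)) = E\big[ E(\exp(iuH_t) \mid Y_t) \big] = E\big( \exp(iuH_{2,t})\, E[\exp(iuH_{1,t}) \mid Y_t] \big).$$

Moreover

$$E[\exp(iuH_{1,t}) \mid Y_t] = \phi_{N_x^{1/2}(\hat F_t^{-1}(\hat G_t(x)) - F_t^{-1}(\hat G_t(x))) \mid Y_t}\big( (N_y/(N_x + N_y))^{1/2} u \big). \qquad (8)$$

From (7) it follows that, $\forall u \in \mathbb{R}$,

$$\phi_{N_x^{1/2}(\hat F_t^{-1}(\hat G_t(x)) - F_t^{-1}(\hat G_t(x))) \mid Y_t}(u) \to \exp\left( -\frac{1}{2} u^2 \hat\sigma_t^2(x) \right)$$

as $N_x \to \infty$, where

$$\hat\sigma_t^2(x) = \frac{\hat G_t(x)(1 - \hat G_t(x))}{f_t^2\big(F_t^{-1}(\hat G_t(x))\big)}. \qquad (9)$$

The convergence (5) yields $\hat G_t(x) \xrightarrow{P} G_t(x)$, as
$N_y \to \infty$, which implies, combined with (8)-(9), Assumption 1 and
$h_t(x) = F_t^{-1}(G_t(x))$, that

$$E[\exp(iuH_{1,t}) \mid Y_t] \xrightarrow{P} \exp\left( -\frac{1}{2}(1 - a) u^2 \sigma_t^2(x) \right), \qquad (10)$$

as $N_x \to \infty$ and $N_y \to \infty$. Moreover we have

$$\exp(iuH_{2,t}) = \exp\left( iu \left( \frac{N_x}{N_x + N_y} \right)^{1/2} N_y^{1/2} \big(F_t^{-1}(\hat G_t(x)) - F_t^{-1}(G_t(x))\big) \right).$$

Since the function $x \mapsto \exp(iux)$ is continuous, the convergence (6)
and Assumption 1 yield

$$\exp(iuH_{2,t}) \xrightarrow{d} \exp(iu a^{1/2} \widetilde H_{2,t}), \quad \text{as } N_x \to \infty,\ N_y \to \infty, \qquad (11)$$

where $\widetilde H_{2,t}$ is centered Gaussian distributed with variance equal
to $\sigma_t^2(x)$. From (10) and (11) it follows that, as $N_x \to \infty$
and $N_y \to \infty$,

$$\exp(iuH_{2,t})\, E[\exp(iuH_{1,t}) \mid Y_t] \xrightarrow{d} \exp(iu a^{1/2} \widetilde H_{2,t}) \exp\left( -\frac{1}{2}(1 - a) u^2 \sigma_t^2(x) \right). \qquad (12)$$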
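Since $E[\exp(iuH_{1,t}) \mid Y_t]$ and $\exp(iuH_{2,t})$ are bounded almost
surely, it follows from (12) that, as $N_x \to \infty$, $N_y \to \infty$,

$$\phi_{H_t}(u) = E\big[ \exp(iuH_{2,t})\, E(\exp(iuH_{1,t}) \mid Y_t) \big] \to E\big( \exp[iu a^{1/2} \widetilde H_{2,t}] \big) \exp\left( -\frac{1}{2}(1 - a) u^2 \sigma_t^2(x) \right)$$

$$= \exp\left( -\frac{1}{2} a u^2 \sigma_t^2(x) \right) \exp\left( -\frac{1}{2}(1 - a) u^2 \sigma_t^2(x) \right) = \exp\left( -\frac{1}{2} u^2 \sigma_t^2(x) \right),$$

where $\widetilde H_{2,t}$ denotes a centered Gaussian variable with variance
$\sigma_t^2(x)$; therefore the desired conclusion (4) holds.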
Lemma 2.1 implies that

$$\hat h_t(x) = h_t(x) + \sigma_{1,t}(x)\varepsilon_t + r_t, \qquad (13)$$

where $\sigma_{1,t}(x)$ is given by (2), $(\varepsilon_t)$ is a standard
Gaussian white noise and the remainder term $r_t$ is such that

$$r_t = o_P\big( \{(N_x + N_y)/N_x N_y\}^{1/2} \big). \qquad (14)$$

Let $D = D[0, 1]$ be the space of random functions that are right-continuous
and have left limits, endowed with the Skorohod topology. The weak
convergence of a sequence of random elements $X_n$ in $D$ to a random element
$X$ in $D$ will be denoted by $X_n \Rightarrow X$. Let

$$W_n(\tau) = \frac{1}{\sqrt{n}} \sum_{t=1}^{[n\tau]} \sigma_{1,t}(x)\varepsilon_t, \quad \tau \in [0, 1]. \qquad (15)$$

Lemma 2.2 Under Assumptions 1-3 we have

$$W_n \Rightarrow \sigma(x) W, \qquad (16)$$

where $W$ stands for the standard Brownian motion and $\sigma(x)$ is the
limit appearing in Assumption 3.
Proof Assumption 2 implies that

$$\sigma_{1,t}(x) < C,$$

for some positive constant $C$ and $N_x$ and $N_y$ large enough. Hence
$\sigma_{1,t}(x)$ is a bounded deterministic sequence, therefore the weak
convergence (16) follows from Theorem A.1 of [5].
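Lemma 2.3 Under the null $H_0$, as $n \to \infty$, $N_x \to \infty$ and
$N_y \to \infty$ we have

$$\hat\sigma_n^2 = \frac{1}{n} \sum_{t=1}^{n} \big(\hat h_t(x) - \bar h(x)\big)^2 \xrightarrow{P} \sigma^2(x), \qquad (17)$$

where $\sigma^2(x)$ is the limit appearing in Assumption 3.

Proof Under the null $H_0$: $h_t(x) = h(x)$ the equality (13) becomes

$$\hat h_t(x) = h(x) + \sigma_{1,t}(x)\varepsilon_t + r_t.$$

Let $y_t = h(x) + \sigma_{1,t}(x)\varepsilon_t$ and
$\bar y = \sum_{t=1}^{n} y_t / n$; then by using the same argument as in
Theorem 1 of [5] we obtain

$$\frac{1}{n} \sum_{t=1}^{n} (y_t - \bar y)^2 \xrightarrow{P} \sigma^2(x). \qquad (18)$$

We have

$$\frac{1}{n} \sum_{t=1}^{n} \big(\hat h_t(x) - \bar h(x)\big)^2 = \frac{1}{n} \sum_{t=1}^{n} (y_t - \bar y)^2 + \frac{1}{n} \sum_{t=1}^{n} (r_t - \bar r)^2 + \frac{2}{n} \sum_{t=1}^{n} (y_t - \bar y)(r_t - \bar r), \qquad (19)$$

where $\bar r = \sum_{t=1}^{n} r_t / n$. From (14) it follows that

$$\bar r = o_P\big( ((N_x + N_y)/N_x N_y)^{1/2} \big) = o_P(1), \quad \text{as } N_x \to \infty,\ N_y \to \infty,$$

which implies that

$$\frac{1}{n} \sum_{t=1}^{n} (r_t - \bar r)^2 = o_P(1), \quad \text{as } N_x \to \infty,\ N_y \to \infty. \qquad (20)$$

By using the Cauchy-Schwarz inequality, we have

$$\left| \frac{1}{n} \sum_{t=1}^{n} (y_t - \bar y)(r_t - \bar r) \right| \le \left( \frac{1}{n} \sum_{t=1}^{n} (y_t - \bar y)^2 \right)^{1/2} \left( \frac{1}{n} \sum_{t=1}^{n} (r_t - \bar r)^2 \right)^{1/2}.$$

Hence by using (18) and (20) we get

$$\frac{1}{n} \sum_{t=1}^{n} (y_t - \bar y)(r_t - \bar r) = o_P(1), \quad \text{as } N_x \to \infty,\ N_y \to \infty. \qquad (21)$$

The desired conclusion (17) holds by combining (18)-(21).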
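Proof of Theorem 2.1 Under the null, the process $B_n(\tau, x)$ in (1) can be
rewritten as

$$B_n(\tau, x) = \frac{1}{\sqrt{n}\,\hat\sigma_n} \left( \sum_{t=1}^{[n\tau]} \sigma_{1,t}(x)\varepsilon_t - \frac{[n\tau]}{n} \sum_{t=1}^{n} \sigma_{1,t}(x)\varepsilon_t \right) + R_n(\tau, x)$$

$$= \frac{1}{\hat\sigma_n} \left( W_n(\tau) - \frac{[n\tau]}{n} W_n(1) \right) + R_n(\tau, x),$$

where the remainder term $R_n(\tau, x)$ is given by

$$R_n(\tau, x) = \frac{1}{\sqrt{n}\,\hat\sigma_n} \left( \sum_{t=1}^{[n\tau]} r_t - \frac{[n\tau]}{n} \sum_{t=1}^{n} r_t \right).$$

Now observe that

$$\sum_{t=1}^{[n\tau]} r_t = o_P\big( [n\tau] ((N_x + N_y)/N_x N_y)^{1/2} \big),$$

which together with (17) implies that

$$R_n(\tau, x) = o_P(1) \quad \text{under Assumption 4.}$$

Hence

$$B_n(\tau, x) = \frac{1}{\hat\sigma_n} \left( W_n(\tau) - \frac{[n\tau]}{n} W_n(1) \right) + o_P(1),$$

which combined with (16) and (17) yields

$$B_n(\cdot, x) \Rightarrow B,$$

where $B(\tau) = W(\tau) - \tau W(1)$ is a Brownian bridge. Therefore

$$\sup_{0 \le \tau \le 1} |B_n(\tau, x)| \xrightarrow{d} \sup_{0 \le \tau \le 1} |B(\tau)|. \qquad (22)$$

Let $F(\mathbb{R}, \mathbb{R})$ be the space of square integrable functions
endowed with the uniform norm $\|\cdot\|_\infty$. For a given square
integrable function $w$, the functional
$\mathcal{G}_w : (F(\mathbb{R}, \mathbb{R}), \|\cdot\|_\infty) \to (\mathbb{R}, |\cdot|)$
defined by

$$\mathcal{G}_w(g) = \int_{\mathbb{R}} w(x) g(x)\,dx$$

is continuous. To obtain the convergence (3) it is sufficient to apply (22)
and the continuous mapping theorem.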
3 Empirical study

For simplicity we consider $N_x = N_y = N$. Data are generated from three
models. First, $Y_t$ is normally distributed with mean 0 and variance 1, and
$X_t$ is generated independently by the transformation $X_t = h_t(Z_t)$,
where $Z_t$ is another Gaussian process with mean 0 and variance 1. Second,
$Y_t$ is an autoregressive process of order 1 (AR1) with correlation
coefficient equal to 0.5, and $X_t$ is generated independently by the
transformation $X_t = h_t(Z_t)$, where $Z_t$ is another AR1 process. For the
last model the random variables are paired: the $Y_t$ are independent Gaussian
variables with mean 0 and variance 1, and $X_t = h_t(Y_t)$; that is, the time
transformation acts on the random variables themselves. It is clear that this
implies the same transformation for the corresponding distributions.

Alternatives. The following five alternatives are considered.

First alternative: A1
Change in the mean. $h_{1,t}(x) = \dfrac{2t^2}{1 + t^2} + x$.

Second alternative: A2
Change in the variance. $h_{2,t}(x) = \dfrac{2t^2}{1 + t^2}\, x$.

Third alternative: A3
Jump. $h_{3,t}(x) = x + 0.05\, \mathbf{1}_{t < n/2} + 0.005 (n - t)\, \mathbf{1}_{t > n/2}$,
where $\mathbf{1}_{t > n/2} = 1$ if $t > n/2$ and 0 otherwise.

Fourth alternative: A4
Smooth change in the mean. $h_{4,t}(x) = x + (1 + \exp(-0.01 (t - 1)^2))$.

Fifth alternative: A5
Smooth change in the mean. $h_{5,t}(x) = x + (1 + \exp(-0.05 (t - 1)))^{-1}$.

All alternatives are smooth and less rough than a classical break in the
mean or in the variance, except A3, which coincides with a jump in the mean.
The first two alternatives A1-A2 tend quickly to the null model $H_0$ as the
length $n$ increases. Figure 1 illustrates the proximity of $h_t$ to a
constant for large time lengths in the case of alternative A1. In contrast,
alternatives A4-A5 are very smooth and converge slowly to the null model.
Figure 2 illustrates this smooth convergence under alternative A4.
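The different speeds at which the alternatives approach a constant transformation can be made concrete with a few lines of code. The sketch below codes only A1, A2 and A5, with the formulas taken as $2t^2/(1+t^2) + x$, $\{2t^2/(1+t^2)\}\,x$ and $x + (1+\exp(-0.05(t-1)))^{-1}$; the function names and the helper `drift`, which measures how much the transformation still moves between two time points, are hypothetical.

```python
import math


def h1(t, x):
    """A1: time-varying shift 2t^2 / (1 + t^2) added to x."""
    return 2.0 * t * t / (1.0 + t * t) + x


def h2(t, x):
    """A2: time-varying scale 2t^2 / (1 + t^2) applied to x."""
    return 2.0 * t * t / (1.0 + t * t) * x


def h5(t, x):
    """A5: logistic shift (1 + exp(-0.05 (t - 1)))^{-1} added to x."""
    return x + 1.0 / (1.0 + math.exp(-0.05 * (t - 1)))


def drift(h, t0, t1, x):
    """Absolute change of the transformation at point x between times t0 and t1."""
    return abs(h(t1, x) - h(t0, x))


# Change between t = 20 and t = 200: tiny for A1 and A2, still sizeable for A5.
drift_a1 = drift(h1, 20, 200, 0.0)
drift_a2 = drift(h2, 20, 200, 1.0)
drift_a5 = drift(h5, 20, 200, 0.0)
```

Under A1 and A2 almost all of the variation is concentrated in the first few time points, so later observations behave as if they were generated under the null, whereas under A5 the transformation keeps moving over a much longer stretch.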
Figure 1: Representation under alternative A1 of $h_t = 2t^2/(1 + t^2)$ for
time length = 20 (a) and time length = 200 (b)

Figure 2: Representation under alternative A4 of
$h_t = (1 + \exp(-0.01(t - 1)^2))$ for time length = 20 (a) and time
length = 200 (b)



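Bootstrap procedure. To evaluate the power of our testing procedure we
first consider a Monte Carlo statistic. Given $M$ points $x_1, \dots, x_M$ in
$\mathcal{Y}$ we consider

$$S_M(w) = \frac{1}{M} \sum_{l=1}^{M} w(x_l) A(x_l), \qquad (23)$$

where

$$A(x_l) = \max_{1 \le k \le n} \left| \frac{1}{\sqrt{n}\,\hat\sigma_n} \left( \sum_{t=1}^{k} \hat h_t(x_l) - \frac{k}{n} \sum_{t=1}^{n} \hat h_t(x_l) \right) \right|$$

with

$$\hat\sigma_n = \left( \frac{1}{n} \sum_{t=1}^{n} \big(\hat h_t(x_l) - \bar h(x_l)\big)^2 \right)^{1/2}.$$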

The convergence of the statistic $S_M(w)$ is not guaranteed since the
$A(x_l)$ are dependent. To overcome this problem, a bootstrap procedure is
proposed. We construct a naive bootstrap statistic; that is, the test
statistic $S_M(w)$ given in (23) is compared to the empirical bootstrap
distribution obtained from $(S^*_{M,b})_{b=1,\dots,B}$, with $S^*_{M,b}$
constructed from the bootstrapped sample drawn randomly with replacement and
satisfying the size equalities $N_x^* = N_x$ and $N_y^* = N_y$. We fix $w$ as
a constant. Note that if $X$ and $Y$ are paired, the bootstrap procedure
consists in drawing randomly with replacement $N$ pairs $(X, Y)$ from the
data. We fix $B = 200$ bootstrap replications.
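The procedure can be sketched end to end as follows, under several assumptions that are not fixed by the text: panels are stored as lists indexed by time and then by subject, whole subjects are resampled with the same drawn indices at every time point, $w \equiv 1$, and the comparison with the bootstrap distribution is summarised by the proportion of bootstrap values at least as large as the observed statistic. All function names are hypothetical, and this is one possible reading of the naive bootstrap rather than the authors' implementation.

```python
import math
import random


def h_hat(xs_sorted, ys, x):
    """Order-statistic estimator with rank [N_x * G_hat(x)] + 1 (clipped to N_x)."""
    n_x, n_y = len(xs_sorted), len(ys)
    count = sum(1 for v in ys if v <= x)
    return xs_sorted[min((n_x * count) // n_y + 1, n_x) - 1]


def cusum_sup(series):
    """max over k of the normalised CUSUM |B_n(k/n, x)|; 0 for a constant series."""
    n, total = len(series), sum(series)
    sigma = math.sqrt(sum((h - total / n) ** 2 for h in series) / n)
    if sigma == 0.0:
        return 0.0
    partial, best = 0.0, 0.0
    for k, h in enumerate(series, start=1):
        partial += h
        best = max(best, abs(partial - k / n * total))
    return best / (math.sqrt(n) * sigma)


def statistic(x_panel, y_panel, grid):
    """S_M(w) of (23) with the constant weight w = 1 (assumption)."""
    sorted_x = [sorted(row) for row in x_panel]  # one sorted sample per time t
    values = [cusum_sup([h_hat(sx, ys, x) for sx, ys in zip(sorted_x, y_panel)])
              for x in grid]
    return sum(values) / len(values)


def resample(panel, idx):
    """Keep the subjects listed in idx, at every time point."""
    return [[row[i] for i in idx] for row in panel]


def bootstrap_pvalue(x_panel, y_panel, grid, n_boot=200, paired=False, rng=None):
    """Observed statistic and share of bootstrap statistics >= observed (assumption)."""
    rng = rng or random.Random(0)
    n_x, n_y = len(x_panel[0]), len(y_panel[0])
    observed = statistic(x_panel, y_panel, grid)
    exceed = 0
    for _ in range(n_boot):
        ix = [rng.randrange(n_x) for _ in range(n_x)]
        # Paired data: the same subjects are drawn for X and Y.
        iy = ix if paired else [rng.randrange(n_y) for _ in range(n_y)]
        if statistic(resample(x_panel, ix), resample(y_panel, iy), grid) >= observed:
            exceed += 1
    return observed, exceed / n_boot
```

A call such as `bootstrap_pvalue(x_panel, y_panel, grid=[-0.5, 0.0, 0.5], n_boot=200)` would return the observed value of $S_M(w)$ together with the bootstrap proportion; passing `paired=True` draws common indices for both panels, mirroring the paired resampling described above.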



Powers. For each alternative, the test statistic is computed, based on sample
sizes $N = 50, 100$, for a theoretical level $\alpha = 5\%$. The time lengths
are $n = 20$, 100 and 200; that is, the function $h_t$ is observed $N$ times
for each $t$ varying in $[0; 20]$, or $[0; 100]$, or $[0; 200]$, with a step
equal to one. The empirical power of the test is defined as the percentage of
rejections of the null hypothesis over 10000 replications of the test
statistic under the alternative.

Figure 3 presents empirical powers of the bootstrap test for all
alternatives, in the case where the $X_t$ are independent standard Gaussian
variables. Solid lines and dotted lines correspond to $N = 50$ and 100
respectively. It can be observed that the power decreases with the length for
alternatives A1 and A2. This is in accordance with the previous remark:
$h_t$ is close to the null hypothesis for relatively large values of $n$.
Passing from a time length of 20 to a time length of 200 thus corresponds to
adding variables with a nearly constant transformation in distribution (see
Figure 1).

Alternatives A4-A5 behave similarly, with a power increasing with $n$. This
can be explained by the very slow convergence to the null model. Here,
passing from a time length of 20 to a time length of 200 corresponds to
adding new observations with a time-dependent transformation (see Figure 2).

It is also observed that the power associated with alternative A3 increases
with $n$.

In Figure 4 empirical powers are presented in the case where $Y_t$ follows an
AR1 process with a correlation coefficient equal to 0.5. Here powers are
slightly better and more stable with respect to the length. This is due to
the correlation inducing more stability of the process $Y_t$ and permitting a
better estimation of $h_t$.

Figure 5 presents results in the case of paired data, with $Y_t$ normally
distributed. Powers are good because the transformations do not occur
randomly, since we have considered $X_t = h_t(Y_t)$. Then $h_t$ can be
efficiently estimated and its variations are well detected.
Figure 3: Empirical powers for alternatives A1 (•) and A2 (∘) on the left, A3
(∘), A4 (△) and A5 (▽) on the right, with $X_t$ distributed as
$\mathcal{N}(0, 1)$. Solid lines correspond to $N = 50$ and dotted lines
correspond to $N = 100$. The time lengths are $n = 20, 100, 200$

Figure 4: Empirical powers for alternatives A1 (•) and A2 (∘) on the left, A3
(∘), A4 (△) and A5 (▽) on the right, with $X_t$ following an AR1 process with
correlation 0.1. Solid lines correspond to $N = 50$ and dotted lines
correspond to $N = 100$. The time lengths are $n = 20, 100, 200$

Figure 5: Empirical powers for alternatives A1 (•) and A2 (∘) on the left, A3
(∘), A4 (△) and A5 (▽) on the right, with $X_t$ and $Y_t$ paired. Solid lines
correspond to $N = 50$ and dotted lines correspond to $N = 100$. The time
lengths are $n = 20, 100, 200$



4 Concluding remarks 

The proposed method concerns the comparison of two processes when panel
data are available. The test makes it possible to detect a change in the
relation between the two process distributions. It can therefore detect a
change in higher moments (not only in the mean and/or in the variance, as
most tests do in this framework). The asymptotic distribution of the proposed
statistic was derived under the null of no change in the relation between the
two process distributions.

The Monte Carlo simulations show that our test performs well in finite
samples and has good power against either abrupt or smooth changes. It is
also valid for paired processes and can then be used to detect a change in
$h_t$ in the relation $X_t = h_t(Y_t)$ (see the paired case in our
simulations). The test can also be used as a first step that legitimates the
estimation and interpretation of a constant transformation $h$ between two
panel data sets, as for instance in a medical follow-up study.

A direction for future research is to consider a $d$-sample comparison of
distributions, for $d > 2$, in the way of [31 [5]. Another direction would be
to consider multivariate distributions.